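Designing a concurrent architecture in Elixir goes well beyond spawning processes; it takes a disciplined method to reach 99.9999999% reliability ("nine nines"). That is roughly one second of downtime every 30 years. To meet that bar, we use the Five-Question Framework.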
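The "one second every 30 years" figure can be checked with a few lines of arithmetic. The sketch below is only an illustration: the `Availability` module and its function names are hypothetical, and a year is assumed to be 365.25 days.

```elixir
defmodule Availability do
  # Assumption: one year is 365.25 days, i.e. 31,557,600 seconds.
  @seconds_per_year 365.25 * 24 * 60 * 60

  # Expected downtime, in seconds per year, for an availability with the
  # given number of nines (5 means 99.999%, 9 means 99.9999999%).
  def downtime_per_year(nines) do
    :math.pow(10, -nines) * @seconds_per_year
  end

  # Number of years it takes for one second of downtime to accumulate.
  def years_per_second_of_downtime(nines) do
    1.0 / downtime_per_year(nines)
  end
end
```

Under these assumptions, `Availability.downtime_per_year(9)` comes to about 0.03 seconds per year, so one second accumulates roughly every 32 years. For comparison, five nines allows a little over five minutes of downtime per year.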
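Structured Heuristics
Before writing a single line of OTP code, use the following questions to break a stateful problem down into manageable basic units:
- Environment and constraints: Is this a single node or a global cluster? What are the memory or I/O limits?
- Focal points: Where does the data live? Who "owns" the state (for example, the results ledger)?
- Runtime characteristics: How many concurrent requests are there? Are they CPU-bound or I/O-bound?
- Protection: Which state must survive? Which state can be lost and restarted?
- Initialization: How do we start the system tree? Which services depend on other services?
By treating these questions as constraints, you avoid a "big ball of mud" concurrency architecture, in which every process talks directly to every other process and there is no clear hierarchy.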
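One way the answers to the last two questions might surface in code is in the child order and restart strategy of a supervision tree. The following is a minimal sketch, not a prescribed layout: the `Sketch.Ledger` and `Sketch.Tree` modules and the `Sketch.WorkerSupervisor` name are hypothetical, and the ledger is modelled as a plain `Agent` only to keep the example short.

```elixir
defmodule Sketch.Ledger do
  # Hypothetical owner of the "precious" state (the results ledger).
  use Agent

  def start_link(_opts) do
    Agent.start_link(fn -> %{} end, name: __MODULE__)
  end
end

defmodule Sketch.Tree do
  def start_link do
    children = [
      # Children start in list order, so the ledger is running before
      # any worker can be started under the worker supervisor.
      Sketch.Ledger,
      # Disposable workers live under their own supervisor.
      {DynamicSupervisor, name: Sketch.WorkerSupervisor, strategy: :one_for_one}
    ]

    Supervisor.start_link(children, strategy: :rest_for_one, name: Sketch.Supervisor)
  end
end
```

With `:rest_for_one`, a crash of the worker supervisor leaves the ledger untouched, while a crash of the ledger also restarts the worker supervisor that was started after it. Note that an `Agent` restarted this way begins again from its initial state; state that must outlive its own process has to be kept somewhere else.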
QUESTION 1
Regarding the Five-Question Framework: what is the 'environment' and what are its constraints?
- The local IDE settings and compiler versions.
- The hardware/OS context, such as memory limits or whether it's a distributed cluster.
✅ Correct (the hardware/OS context): This defines the physical and logical boundaries of the system.
❌ Incorrect (IDE settings and compiler versions): The environment refers to the runtime infrastructure (BEAM nodes, CPU cores, I/O limits).
QUESTION 2
What are the 'obvious focal points' in a system?
- Discrete responsibilities or modules that own specific state.
- The syntax errors highlighted by the compiler.
✅ Correct (discrete responsibilities or modules): Focal points are structural 'anchors' for state management.
❌ Incorrect (syntax errors): Focal points represent architectural responsibilities like results aggregation or work distribution.
QUESTION 3
What does 'What do I protect from errors?' primarily help you define?
- Password hashing algorithms.
- Supervision boundaries and restart strategies.
✅ Correct (supervision boundaries and restart strategies): It helps you decide what is precious (must survive) and what is disposable (can restart).
❌ Incorrect (password hashing algorithms): In OTP, protection refers to fault tolerance and shielding state from process crashes.
QUESTION 4
Which question addresses the initialization order of the supervision tree?
- How do I get this thing running?
- What are the runtime characteristics?
✅ Correct (How do I get this thing running?): This question ensures that dependencies (like a database or a stash) are ready before workers start.
❌ Incorrect (What are the runtime characteristics?): Runtime characteristics focus on load and performance, not the boot sequence.
QUESTION 5
What is the 'Nine Nines' standard equivalent to?
- 1 minute of downtime per year.
- Roughly 1 second of outage every 30 years.
✅ Correct (roughly 1 second every 30 years): This high bar is why OTP's supervision trees and fault tolerance are so critical.
❌ Incorrect (1 minute per year): Nine Nines is 99.9999999%, which is far more stringent than the industry-standard 'five nines'.
Case Study: The Duper Duplicate Finder
Applying the Framework to a File Auditing Tool
You are building 'Duper', an application that crawls a filesystem to find duplicate files based on their hashes. It must handle massive directories without crashing or leaking memory.
Q1. Analyze the environment constraints for Duper. What is the primary bottleneck?
Solution:
The primary constraints are filesystem I/O and memory. We cannot load 100,000 file paths into memory at once; thus, we need a stream-based or 'hungry consumer' approach, in which each worker asks for the next path only when it is ready for more work.
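A 'hungry consumer' arrangement could be sketched as a server that hands out one path per request and expands directories only as it reaches them. This is an assumption-laden illustration rather than Duper's actual implementation: the `Sketch.PathFinder` module and its `next_path/0` function are hypothetical names, and the walk does not guard against symlink cycles.

```elixir
defmodule Sketch.PathFinder do
  use GenServer

  def start_link(root) do
    GenServer.start_link(__MODULE__, root, name: __MODULE__)
  end

  # Workers call this when they are ready for more work.
  # Returns the next file path, or nil once the walk is finished.
  def next_path do
    GenServer.call(__MODULE__, :next_path)
  end

  @impl true
  def init(root), do: {:ok, [root]}

  @impl true
  def handle_call(:next_path, _from, pending) do
    {path, rest} = pop_file(pending)
    {:reply, path, rest}
  end

  # Nothing left to visit: the walk is finished.
  defp pop_file([]), do: {nil, []}

  defp pop_file([entry | rest]) do
    if File.dir?(entry) do
      # Expand one directory at a time instead of listing the whole tree.
      # Unreadable directories are skipped.
      children =
        case File.ls(entry) do
          {:ok, names} -> Enum.map(names, &Path.join(entry, &1))
          {:error, _reason} -> []
        end

      pop_file(children ++ rest)
    else
      {entry, rest}
    end
  end
end
```

Each worker would loop on `Sketch.PathFinder.next_path()` until it receives `nil`. The pending list holds only the unvisited siblings along the current branch of the walk, so its size follows the shape of the tree being visited rather than the total number of files.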
Q2. In Duper, what focal point is 'precious' and must be protected from worker crashes?
Solution:
The Results aggregator. While a worker hashing a corrupted file might crash, the collection of results found so far must be preserved in a supervised server that is decoupled from the workers.
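A results aggregator of this kind might look like the following sketch. It is one possible shape under stated assumptions, not Duper's confirmed code: the `Sketch.Results` module, its function names, and the choice of a map from hash to list of paths are all hypothetical.

```elixir
defmodule Sketch.Results do
  use GenServer

  def start_link(_opts) do
    GenServer.start_link(__MODULE__, :no_args, name: __MODULE__)
  end

  # Workers report each finished file; the cast returns immediately.
  def add_hash_for(path, hash) do
    GenServer.cast(__MODULE__, {:add, path, hash})
  end

  # Returns a list of path groups that share the same hash.
  def find_duplicates do
    GenServer.call(__MODULE__, :find_duplicates)
  end

  @impl true
  def init(:no_args), do: {:ok, %{}}

  @impl true
  def handle_cast({:add, path, hash}, results) do
    # State is a map of hash => list of paths that produced that hash.
    {:noreply, Map.update(results, hash, [path], &[path | &1])}
  end

  @impl true
  def handle_call(:find_duplicates, _from, results) do
    duplicates =
      results
      |> Map.values()
      |> Enum.filter(&(length(&1) > 1))

    {:reply, duplicates, results}
  end
end
```

As long as this server is supervised separately from the workers and they only send it messages, a worker that crashes while hashing a corrupted file takes nothing with it: everything reported before the crash remains in the map.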